SUMMARY 


Overview 


A series  of  experiments  examined  the  subjective  criteria 
which  people  use  when  evaluating  alternative  explanations  for 
everyday  events.  It  was  found  that  they  typically  rely  on  a 
strategy  which  is  inappropriate  according  to  most  theories  of 
how  one  should  explain  events.  The  implications  of  these  results 
for  the  effective  interpretation  of  events  are  discussed. 

Background 

A key  element  in  command  and  control  tasks  is  evaluating 
alternative  explanations  of  observed  events.  For  example,  "Is 
his  performance  so  lackluster  due  to  lack  of  ability  or  lack  of 
motivation?"  "Is  the  brusqueness  of  his  response  due  to 
rudeness  or  anxiety?"  So  many  such  decisions  must  be  made  in 
a working  day  that  it  is  often  impossible  to  do  the  sort  of 
detailed  data  collection  and  analysis  needed  to  produce  the  best 
possible  information. 

Approach 

The  present  studies  ask  "What  subjective  criterion  do 
people  use  when  forced  to  make  quick  evaluations  of  alternative 
explanations?"  and  subsequently,  "How  appropriate  is  that 
criterion?"  First,  a set  of  possible  criteria  was  identified. 
Second,  a set  of  explanation  situations  was  developed  which 
discriminated  between  those  criteria.  Third,  it  was  determined 
which of these criteria best predicted judgments of "better
explanation."


Findings  and  Implications 


The  subjective  judgment  which  best  predicts  which  of  two 
possible  causes  is  judged  to  be  a better  explanation  of  a given 
event  is:  "Which  of  these  two  causes  is  more  likely  to  have 
been  present  given  that  the  event  occurred?"  This  judgment 
measures  the  necessity  of  the  causes  given  the  event.  According 
to  most  (if  not  all)  philosophies  of  science,  one  should,  however, 
be  asking  about  the  sufficiency  of  the  cause  to  produce  the  event, 
i.e.,  "How  likely  is  the  event  given  the  cause?"  Use  of  the 
necessity  criterion  can  severely  restrict  the  extent  to  which 
the  explanation  of  past  events  improves  our  ability  to  predict 
future  events.  Possible  reasons  for  this  bias  and  ways  to 
ameliorate  it  are  discussed. 
1. WHAT MAKES A GOOD EXPLANATION?


Event E occurs and two possible causes, C1 and C2, are
offered. How does one decide which provides the better
explanation?

The most widely accepted proposal for how one should
make such judgments is offered by the hypothetico-deductive or
covering-law model of explanation (e.g., Hempel, 1965).
According to that model, an event has been explained when one
has identified a set of initial conditions, C_i, and general
laws of behavior, L_j, such that

P(E | C_1, ..., C_i, ..., C_n; L_1, ..., L_j, ..., L_m) = 1.

That is, an event is explained
when one has shown it to have been inevitable. Thus, explanation
is a form of after-the-fact prediction (or postdiction). Of two
sets of initial conditions, the better explanation is provided
by the one which makes E more likely. Assuming that the set
of laws, L_j, is the same when considering C1 and C2, the better
explanation is provided by the cause C_i for which P(E|C_i) is greater.

Among  philosophies  of  science,  the  leading  competitor 
to  the  covering-law  model  is  the  coherence  or  colligation 
criterion  (e.g.,  Gallie,  1964),  according  to  which  the  best 
explanation  is  provided  by  the  cause  whose  conjunction  with  the 
event  produces  the  best  story.  Thus,  the  coherence  criterion 
invokes  a qualitative,  quasi-aesthetic  judgment.  It  is  advanced 
most  frequently  by  historians  who  contend  that  the  covering-law 
model  is  appropriate  to  physics,  but  not  to  a profession  dealing 
with  unique  (i.e.,  behavioral)  events.  In  Walsh's  words  (1967), 
...the historian's aim is to make a coherent whole out
of the events he studies. His way of doing that, I
suggest, is to look for certain dominant concepts or
leading ideas by which to illuminate his facts, to trace
the connections between those ideas themselves and then
to show how the detailed facts became intelligible in
the light of them by constructing a 'significant'
narrative of the events of the period in question.

In this respect the ideal of the historian is in
principle identical with that of the novelist or
dramatist (p. 61).

Belief  in  the  prescriptive  validity  of  these  criteria 
need  not  entail  belief  in  their  descriptive  validity.1  The 
present  study  examines  the  criteria  which  people  invoke  when 
asked: Is C1 or C2 a better explanation of Event E? Some
subjects were asked to choose one of two causes as the better
of  two  explanations  for  each  of  a large  set  of  events.  Other 
groups  judged  the  relationship  between  the  same  causes  and  events 
by  various  criteria  which  have  been  advanced  as  constituting 
the  equivalent  of  "being  a good  explanation,"  among  them  the 
hypothetico-deductive  and  coherence  criteria.  Our  goal  was  to 
discover  which  of  these  alternative  criteria  produces  responses 
most  similar  to  those  elicited  by  the  question,  "which  provides 
a better  explanation?" 

A similar  approach  was  adopted  by  Bear  (1974)  to  validate 
his contention that one particular subjective criterion, P(E|C_i),
is equivalent to the "good explanation" judgment. For five
items, he found that if C1 was chosen as a better explanation
than C2, then P(E|C1) was judged to be greater than P(E|C2).
Unfortunately, in each of his examples, one of the two possible
causes had nothing at all to do with the event. Therefore,
almost any reasonable judgment [not just P(E|C_i)] would
associate the relevant cause with the event in his examples.
2. EXPERIMENT 1


2.1 Method


Every task involved an event, E, and two possible causes,
C1 and C2. Each task required subjects to make one of eight
judgments, each resulting in the choice of either C1 or C2.
Three of these judgments were variants on the question, "which
is a better explanation for E, C1 or C2?" The remaining five
were other judgments which might be the subjective equivalent
of "better explanation."

2.1.1  Explanation  Tasks 

Explanation  1 (both  true) . Before  deciding  which  cause 
makes  a better  explanation,  one  might  want  to  know  whether  both 
or  only  one  of  the  causes  is  true.  One  group  of  subjects  was 
asked  to  assume  that  both  facts  were  true  and  to  choose  that 
which  provided  a better  explanation.  An  example  of  their  task 
read : 


A person  stole  office  supplies  from  his  place  of  work. 

Both  of  the  statements  below  describe  that  person.  Which 
of  the  two  would  be  a more  adequate  explanation  of  this 
action  or  event? 

A.  The  person  was  poor. 

B.  The  person  was  unreliable. 

C.  No  difference  (A  and  B are  equally  good  explanations) . 

Explanation 2 (only one is true; ignore likelihood of
causes). If one is told that only one of the facts reported in
the two cause statements is true, should one consider the
likelihood of each statement being true when determining which
makes a better explanation? For example, one might encounter
a situation where C1, if true, would make a much better
explanation than C2 (if true); however, C1 is unlikely to be
true. To clarify this aspect of the explanation task, two
separate "only one is true" conditions were created.

Explanation  2 completely  ignored  the  likelihood  of  the  causes: 

A person  stole  office  supplies  from  his  place  of  work. 

Only  one  of  the  following  statements  is  true  about  this 
person.  Which  of  the  two,  if  true,  would  be  a more 
adequate  explanation  of  this  action  or  event? 

A.  The  person  was  poor. 

B.  The  person  was  unreliable. 

C.  No  difference  (A  and  B would  be  equally  adequate 
explanations) . 

Explanation  3 (only  one  is  true;  consider  likelihood  of 
causes) . This  condition  made  explicit  reference  to  the  relative 
likelihood  of  the  causes: 

A person  stole  office  supplies  from  his  place  of  work. 

Only  one  of  the  following  statements  applies  to  this 
person.  Which  of  the  two  is  more  likely  to  be  an  adequate 
explanation  of  this  action  or  event? 

A.  The  person  was  poor. 

B.  The  person  was  unreliable. 

C.  No  difference  (A  or  B is  equally  likely  to  be  an 
adequate  explanation) . 


2.1.2  Alternative  Criteria 


Prospective probabilities P(E|C_i). Five alternative
criteria  were  used.  The  first,  called  prospective  probabilities, 
operationalizes  the  hypothetico-deductive  criterion: 

Assume  that  two  people  have  been  described  to  you. 

Person  A:  Was  poor 
Person  B:  Was  unreliable 

Knowing  what  you  know  about  people,  which  of  these  two 
is  more  likely  to  steal  office  supplies  from  his/her 
place  of  work? 

Person  A 

Person  B 

No  difference  (equally  likely) . 

Retrospective probabilities P(C_i|E). The second criterion,
retrospective probabilities, has not been proposed as a
prescriptively valid rule. We include it as the result of
speculating that people may ask "Which cause is more likely
given the event?" rather than "Which cause makes the event more
likely?" That is, perhaps people look at P(C_i|E) rather than
P(E|C_i). P(E|C_i), or prospective probabilities, captures the
sufficiency of the cause for the event. P(C_i|E), or
retrospective probabilities, captures the necessity of the
cause given the event. Although use of such a criterion would
violate the "rational" covering-law model, it might still be
an accurate description of what people do. It suggests, for
example, a detective trying to work backward from events to
causes.


A person  steals  office  supplies  from  his  place  of  work. 

Is  this  person  more  likely  to 

A.  Have  been  poor. 

B.  Have  been  unreliable. 

C.  No  difference  (A  and  B are  equally  likely). 

Coherence.  The  third  criterion  was  an  operationalization 
of  coherence.  One  reason  for  considering  coherence  as  the 
criterion  for  explanatory  adequacy,  other  than  some  historians' 
insistence  that  it  should  be,  is  the  great  difficulty  that  people 
have  with  probabilistic  inference  tasks  (Tversky  and  Kahneman, 
1974).  A non-probabilistic  criterion  asking  for  a quasi-aesthetic 
judgment  might  thus  be  very  attractive  to  such  individuals: 

Which  of  the  following  makes  for  a more  coherent  short 

narrative  (i.e.,  which  is  more  thematically  unified, 

which  sticks  together  better)? 

A.  A person  who  was  poor  stole  office  supplies  from 
his  place  of  work. 

B.  A person  who  was  unreliable  stole  office  supplies 
from  his  place  of  work. 

C.  No  difference  (both  narratives  are  equally  coherent). 

Representativeness . Another  deterministic  criterion 
suggested  by  the  literature  on  probabilistic  inference  is 
judgment  by  representativeness.  Kahneman  and  Tversky  (1973) 
have  shown  that  "intuitive  predictions  follow  a judgmental 
heuristic  — representativeness.  By  this  heuristic,  people 
predict the outcome that appears most representative of the
evidence" (p. 237). Representativeness is thus a judgment
of  "fit"  or  similarity  between  the  outcome  predicted  and  the 
situation  out  of  which  it  arises.  In  predictive  tasks,  judgment 
by  representativeness  has  been  shown  to  produce  some  important 
judgmental  biases.  As  a criterion  for  explanation,  it  might 
suggest  that  judgments  such  as  "that's  just  like  him" 
constitute  adequate  explanations.  One  such  example  might  be 
Lerner's  (1970)  "just  world"  theory,  according  to  which  people 
attribute  bad  outcomes  to  the  badness  of  the  people  to  whom 
they  happen.  An  example  of  this  task  is: 

A person  stole  office  supplies  from  his  place  of  work. 

Is  this  action  or  event  more  fitting  and  appropriate 
for : 


A.  Someone  who  was  poor. 

B.  Someone  who  was  unreliable. 

C.  No  difference. 

Availability. The final nonprobabilistic criterion is
also suggested by a Tversky and Kahneman (1973) heuristic,
"availability," according to which an event seems likely to
the extent that it is easy to imagine how it could happen.
Perhaps a cause explains an event according to the ease with
which the gap between C_i and E is filled, that is, according
to how well the cause enables the explainer to see how the
event happened:
A person stole office supplies from his place of work.
Can  you  more  readily  see  how 

A.  Someone  who  was  poor 

B.  Someone  who  was  unreliable 

would  come  to  take  this  action  or  have  this  happen  to  him  or  is 
there 


C.  No  difference. 

No  claim  is  made  that  these  five  alternative  criteria 
are  distinct.  Indeed,  it  will  shortly  be  seen  that  they  often 
coincide:  causes  that  combine  with  events  to  make  coherent 

narratives  often  make  the  event  seem  likely  and  seem  likely 
given  the  event.  We  know,  however,  too  little  about  the 
relationships  between  these  five  criteria  to  be  able  to  specify 
in  advance  just  how  they  will  differ.  For  example,  Tversky 
and  Kahneman  do  not  specify  whether  representativeness  or 
availability  will  be  invoked  in  situations  where  both  heuristics 
are  applicable.  The  first  step  in  this  investigation  was  to 
discover  situations  in  which  these  criteria  produce  differing 
judgments.  We  could  then  ask,  to  the  extent  that  they  do 
differ,  which  alternative  is  the  best  predictor  of  judgments 
of  explanatory  adequacy? 

2.1.3  Item  Development.  Contrasting  these  eight  judgmental  tasks 
requires  a set  of  items  which  discriminates  between  them.  Items  were 
written  which  seemed  to  us  to  discriminate  between  one  or  more  pairs 


of  judgments.  If,  after  being  presented  to  subjects,  an  item 
did  discriminate  between  tasks,  it  was  kept;  if  not,  it  was 
dropped . 

Items  were  presented  to  two  waves  of  subjects.  The  first 
wave consisted of 8 groups (one per criterion), each of which
judged the same 39 items.2 Of these items, 19 were eliminated
because:  (a)  a majority  of  all  subjects  chose  the  same 

alternative  (A  or  B)  in  each  of  the  8 conditions,  and  (b)  the 
proportion  of  subjects  selecting  A was  not  significantly 
different  in  any  two  conditions,  as  determined  by  a multiple 
range  test  (alpha  = .20).  The  remaining  20  items  were  combined 
with  an  additional  20  items  and  presented  to  a second  wave  of 
8 groups.  Three  of  the  old  items  and  7 of  the  20  new  items 
were  eliminated  from  wave  2 because  of  universal  agreement  (as 
defined  by  [a]  and  [b]  above) . The  remaining  30  items  were 
used  to  determine  the  degree  of  agreement  between  the  different 
judgments . 

Explanation  1 (both  causes  true)  is  not  applicable  to 
items  for  which  the  two  possible  causes  are  contradictory.  Such 
items  are  usable  on  the  other  conditions  and  some  were  included. 
Fifteen  items  in  the  original  set  for  wave  1 and  seven  items  in 
the  set  remaining  after  wave  2 were  of  this  type.  Analyses 
referring  to  Explanation  1 involve  only  the  items  with  non- 
contradictory causes. 

2.1.4  Subjects . A total  of  282  paid  subjects  participated  in  the 
two  waves  of  the  experiment.  They  were  recruited  by  advertisements 


in the University of Oregon student paper. Approximately 15
subjects were in each of the eight conditions in wave 1 (range:
14-17); approximately 20 saw each type of questionnaire in wave
2 (range: 18-21).

2.1.5  Procedure . Questionnaires  were  completed  in  self-paced 
groups  of  30  to  40  subjects,  with  the  8 forms  being  distributed 
unsystematically  among  the  subjects  in  each  group.  After  judging 
the  39  items,  wave  1 subjects  evaluated  their  task  on  7-point 
rating  scales  for  interest,  familiarity,  ambiguity  and  degree 
of  challenge. 

2.2  Results 


2.2.1  Measures . Items  were  assigned  two  scores:  A/All  = the 
number  of  subjects  selecting  alternative  A divided  by  the  total 
number  of  subjects  in  a group;  and  A/AB  = the  number  of  subjects 
selecting  alternative  A divided  by  the  number  of  subjects 
selecting  A or  B (and  not  C) . A/All  gives  the  overall  popularity 
of  alternative  A;  A/AB  gives  its  popularity  among  subjects  who 
could  decide  between  A and  B.  If  analyses  based  on  both  of 
these  scores  are  not  reported,  it  can  be  assumed  that  the  same 
conclusions  were  reached  for  the  unreported  score. 
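
The two item scores can be illustrated with a minimal sketch. The function name `item_scores` and the response counts below are hypothetical and serve only to show how A/All and A/AB differ when some subjects answer C.

```python
from collections import Counter


def item_scores(responses):
    """Return (A/All, A/AB) for one item judged by one group.

    `responses` holds one choice per subject: "A", "B", or "C".
    A/AB is None when no subject chose A or B.
    """
    counts = Counter(responses)
    n_all = len(responses)
    n_ab = counts["A"] + counts["B"]
    a_all = counts["A"] / n_all if n_all else None
    a_ab = counts["A"] / n_ab if n_ab else None
    return a_all, a_ab


# Hypothetical group of 20 subjects: 9 chose A, 6 chose B, 5 chose C.
example = ["A"] * 9 + ["B"] * 6 + ["C"] * 5
a_all, a_ab = item_scores(example)
print(a_all, a_ab)
```

With these invented counts, A/All is 9/20 while A/AB is 9/15, so an item can look evenly split overall yet favor A among subjects who made a choice.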

There  were  substantial  differences  between  subjects  in 
ability  (or  willingness)  to  choose  between  A and  B.  For  each 
of  the  8 tasks  there  was  a bimodal  distribution  for  the  number 
of  C (no  difference)  responses  per  subject,  with  most  subjects 
giving C for 10-20% of the items and a minority giving C for
about  half.  There  were  also  substantial  differences  in  the 
proportion  of  C responses  per  item.  In  wave  1 that  proportion 
ranged  from  .12  to  .54.  However,  the  proportion  of  C responses 
did  not  differ  greatly  across  the  eight  tasks  (range:  .19  to  .26). 

The  seven-point  interest  scales  administered  at  the  end 
of  wave  1 showed  few  large  differences  among  the  eight  tasks. 
However,  prospective  probabilities  were  clearly  viewed  most 
favorably,  with  the  highest  mean  rating  on  "interesting," 

"not  ambiguous"  and  "familiar,"  and  the  second  highest  rating 
on  "challenging."  Availability  received  the  worst  rating  on 
"interesting,"  "challenging"  and  "not  ambiguous,"  although  it 
was  slightly  above  average  on  "familiar." 

2.2.2  Overlap.  Even  though  the  items  were  designed  to  discriminate 
between  the  conditions,  there  was  still  great  similarity  in 
responses  across  the  various  conditions.  The  fact  that  only 
30 of 59 items remained after wave 2 is one indication of this
overlap.  The  correlations  among  the  proportions  of  A responses 
for  the  different  tasks,  across  all  items  used  in  wave  1, 
including  those  eventually  deleted,  were  very  high.  They 
ranged  from  .52  to  .93  with  medians  of  .83  for  A/All  and  .80 
for  A/AB.  Principal  components  factor  analyses  revealed  that 
about  80%  of  the  variance  in  these  scores  could  be  accounted 
for  by  one  factor.  Although  all  variables  loaded  highly  on 
this  one  factor,  prospective  probabilities  had  the  lowest 
loading  for  both  A/All  and  A/AB. 
Table 2-1 shows the intercorrelations among the tasks for
the  30  discriminating  items  remaining  after  wave  2.  These 
correlations  were  based  on  A/AB.  Results  based  on  A/All,  which 
correlated  .93  with  A/AB,  were  essentially  the  same. 

Although  the  intercorrelations  were  high,  they  were  not 
as  high  as  the  correlations  based  on  all  items  used  in  wave  1; 
the  median  correlation  in  Table  2-1  is  .58.  In  addition,  several 
clear  patterns  emerge.  The  three  explanation  judgments  were 
highly similar to one another. Multiple-range tests (using
alpha  = .20  as  a cutoff)  revealed  only  two  cases  among  the  30 
items  for  which  any  of  the  explanation  tasks  differed  from  one 
another.  Either  subjects  do  not  distinguish  between  these 
tasks  or  our  items  and  procedure  failed  to  elicit  whatever 
distinctions  they  can  make.  Whatever  the  case,  in  the  present 
data  these  three  conditions  are  best  treated  as  one. 

Among  the  five  alternative  criteria,  two  clusters  emerged. 
One  involved  prospective  probabilities  and  coherence,  and  the 
second  involved  retrospective  probabilities,  representativeness 
and  availability.  The  mean  correlation  (using  Fisher's  Z) 
between  variables  within  the  same  cluster  was  .66.  The  mean 
correlation  between  variables  in  different  clusters  was  .50. 

This  clustering  was  supported  by  the  multiple-range  test  analyses. 
Of  29  disagreements  (alpha  = .20)  between  pairs  of  five 
alternative  criteria,  26  involved  criteria  belonging  to 
different  clusters.  Apparently,  the  coherence  judgment  evokes 
a forward-looking  perspective  like  that  of  prospective 
probabilities,  whereas  representativeness  and  availability 
induce  people  to  look  backward  from  events  to  antecedents  in 
the  manner  of  retrospective  probabilities. 
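
The mean correlations reported above were averaged using Fisher's Z. As a rough sketch of that procedure, each correlation is transformed to z, the z values are averaged, and the mean is transformed back. The function name `fisher_mean` and the input values are hypothetical, not the study's data.

```python
import math


def fisher_mean(correlations):
    """Average correlations through Fisher's r-to-z transformation."""
    # z = atanh(r) stretches the scale near +/-1 so averaging is less biased.
    zs = [math.atanh(r) for r in correlations]
    mean_z = sum(zs) / len(zs)
    # Back-transform the mean z to the correlation scale.
    return math.tanh(mean_z)


# Hypothetical correlations between pairs of tasks within one cluster.
within_cluster = [0.52, 0.66, 0.80]
print(round(fisher_mean(within_cluster), 3))
```

Because the transformation is nonlinear, the result generally differs slightly from the arithmetic mean of the raw correlations, most noticeably when some values are close to 1.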
CORRELATIONS BETWEEN JUDGMENT TASKS FOR PROPENSITY


Of  these  two  clusters,  the  one  involving  retrospective 
probabilities  was  more  highly  related  to  the  explanation 
judgments.  The  mean  correlation  between  members  of  this 
cluster  and  the  three  explanation  judgments  was  .77,  while 
the  mean  correlation  between  explanation  and  the  members  of 
the  prospective  probabilities  cluster  was  .37.  The  difference 
between  the  two  sets  of  correlations  was  highly  significant 
(z = 3.18, Mann-Whitney U test). Looking at the multiple-range
test, of 30 instances in which judgments in an explanation task
differed significantly from those in an alternative criterion,
24 involved the prospective probability cluster and only 6
involved the remaining three criteria (4 with retrospective
probabilities and 2 with availability). Thus, 80% of the
disagreements were with the prospective probability cluster.
Principal  components  factor  analyses  performed  on  the  matrices 
in  Table  2-2  showed  two  factors  accounting  for  about  80%  of  the 
variance.  One  factor  had  large  loadings  for  prospective 
probabilities  and  coherence;  the  other  factor  showed  large 
loadings  for  the  remaining  6 variables. 

2.2.3 Individual Events. To help clarify the nature of the
difference between these clusters, Table 2-3 presents the 4 items
which produce the greatest overall multiple-range test effect.
Responses  are  pooled  across  the  tasks  in  each  cluster.  For  example, 
22  Explanation  1 subjects,  21  Explanation  2 subjects,  and  22 
Explanation  3 subjects  picked  being  unmarried  as  the  best 
explanation  for  Event  1,  producing  the  65  in  the  upper  left  hand 
corner. For all the examples shown in Table 2-3, the cause
labeled A is the cause most often chosen as the best explanation
and most often chosen by the subjects in the retrospective
cluster of tasks. In our judgment, this cause, A, also has the
highest base rate; that is, there are more unmarried people in
the general population than shy, easily embarrassed recluses,
and there are more upset Republicans than undercover FBI agents,
etc.

Notes to Table 2-3:
NOTE: In each pair, first cause is more necessary, second cause is more sufficient.
a Pooled across the three explanation tasks.
b Pooled across the retrospective probability, representativeness and availability tasks.
c Pooled across the prospective probability and coherence tasks.

2.4  Discussion 


To  the  extent  that  the  five  alternative  criteria  produce 
differing  judgments,  they  fall  into  two  clusters,  one  involving 
prospective  probabilities  and  one  involving  retrospective 
probabilities.  Judgments  of  explanatory  adequacy  are  considerably 
closer  to  judgments  of  retrospective  probability  than  to 
prospective  probability.  Another  way  of  viewing  this  result  is 
as follows: When P(A|E) > P(B|E), A constitutes a more necessary
antecedent of E. When P(E|A) > P(E|B), A constitutes a more
sufficient antecedent of E. The retrospective probability
condition  elicits  a judgment  of  relative  necessity,  while  the 
prospective  probability  condition  elicits  a judgment  of  relative 
sufficiency.  To  the  extent  that  judgments  of  necessity  and 
sufficiency  diverge,  our  subjects  allowed  necessity  to  guide 
their  judgments  of  explanatory  adequacy. 

Of  course,  given  the  high  intercorrelations  within  the 
clusters,  it  would  be  just  as  true  to  say  that  when  judgments 
of  coherence  (a  member  of  the  prospective  probability  cluster) 
and  availability  (a  member  of  the  retrospective  probability 
cluster)  diverge,  the  latter  provides  a better  predictor  of 
explanatory  adequacy.  At  the  moment,  we  cannot  tell  whether 
judgments  of  retrospective  probability,  representativeness  and 
availability  can  ever  be  reliably  distinguished  or  which  is 
the "real" surrogate of "better explanation." We will, however,
use the terms "necessity" and "sufficiency" to identify the
retrospective probability and prospective probability clusters,
respectively.

Why,  when  forced  to  choose,  do  people  prefer  the  more 
necessary  to  the  more  sufficient  explanation? 

In  the  discussion  of  Table  2-3,  we  noted  that  in  each  of 
the  four  examples,  the  more  necessary  cause  appeared  to  us  to 
be more likely. Indeed, it can be shown that when P(C1|E) > P(C2|E)
and P(E|C1) < P(E|C2), then P(C1) > P(C2). That is, if one of two
causes  is  more  necessary  and  one  is  more  sufficient,  then  the 
more  necessary  cause  must  be  more  likely.  Perhaps  people  prefer 
not  to  invoke  unlikely  causes  as  explanations.  A normative 
justification  for  this  policy  would  be  that  people  are  interested 
in  the  overall  (bidirectional)  "correlation"  between  causes  and 
events,  perhaps  in  keeping  with  Kelley's  (1973)  covariation 
principle.  The  prospective  probability  and  retrospective 
probability  criteria  provide  unidirectional  measures  of  association, 
expressing the predictability of events from causes and causes
from events, respectively. A bidirectional measure of association,
say φ or λ, would reflect both of these aspects. Consider such
a correlation  coefficient  applied  to  2 x 2 contingency  tables 
whose  rows  are  labeled  "cause  i present"  and  "cause  i absent" 
and  whose  columns  are  labeled  "event  occurs"  and  "event  does 
not  occur."  When  such  tables  are  constructed  for  each  of  the 
two causes proposed for a given event (as shown in Table 2-2), both
tables  have  the  same  column  marginals.  However,  row  marginals 
differ,  reflecting  the  base  rates  of  the  various  causes.  In 
general,  the  more  extreme  the  row  marginals,  the  smaller  the 
bidirectional  correlation  between  cause  and  event  will  be. 
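
The quantities discussed in this paragraph can all be read off a 2 x 2 table of counts. The sketch below is illustrative only: the function name `table_stats` and the counts are hypothetical, chosen so that both tables share the same column marginals while the row marginals differ. It is not the data of Table 2-2.

```python
import math


def table_stats(a, b, c, d):
    """Summarize a 2 x 2 table of counts laid out as:

                      event occurs   event does not occur
    cause present          a                  b
    cause absent           c                  d

    Returns (necessity, sufficiency, base_rate, phi).
    """
    n = a + b + c + d
    necessity = a / (a + c)      # P(C|E): cause present given the event
    sufficiency = a / (a + b)    # P(E|C): event given the cause
    base_rate = (a + b) / n      # P(C): row marginal for "cause present"
    phi = (a * d - b * c) / math.sqrt((a + b) * (c + d) * (a + c) * (b + d))
    return necessity, sufficiency, base_rate, phi


# Hypothetical counts per 100 people; the event occurs for 20 in both tables.
cause1 = table_stats(18, 32, 2, 48)   # common cause, more necessary
cause2 = table_stats(4, 1, 16, 79)    # rare cause, more sufficient

for label, stats in (("cause 1", cause1), ("cause 2", cause2)):
    print(label, [round(x, 3) for x in stats])
```

In this invented example the rare cause has the more extreme row marginals and the smaller phi, in line with the general tendency described above. Other counts can reverse that ordering, so phi has to be computed for each pair of tables.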

To  recapitulate:  where  relative  necessity  and  relative 
sufficiency  diverge,  the  more  sufficient  cause  will  tend  to  be 
less likely and to have a lower correlation with the event
(unless the more necessary cause is extremely likely). Our
subjects  may  have  chosen  the  more  necessary  cause  because  they 
were  relying  on  some  intuitive  measure  of  bidirectional 
correlation. 
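
The claim that the more necessary cause must also be the more likely one follows from Bayes' rule. A short derivation is sketched here, writing C_1 for the more necessary cause and C_2 for the more sufficient one (labels assumed for illustration) and assuming all the probabilities involved are positive.

```latex
\begin{align*}
P(C_1 \mid E) > P(C_2 \mid E)
  &\iff \frac{P(E \mid C_1)\,P(C_1)}{P(E)} > \frac{P(E \mid C_2)\,P(C_2)}{P(E)} \\
  &\iff \frac{P(C_1)}{P(C_2)} > \frac{P(E \mid C_2)}{P(E \mid C_1)} .
\end{align*}
% If C_2 is the more sufficient cause, P(E | C_1) < P(E | C_2),
% so the right-hand ratio exceeds 1 and therefore P(C_1) > P(C_2).
```

The step from the first line to the second cancels the common factor P(E) and divides both sides by P(E|C_1) P(C_2).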

Experiment  2 asked  whether  subjects  would  choose  more 
necessary  causes  when  they  were  not  more  highly  correlated  with 
the  event. 
3. EXPERIMENT 2


3.1 Method

3.1.1  Design . Four  groups  of  subjects  participated  in 
Experiment  2.  Each  answered  questions  pertaining  to  the  four 
items  in  Table  2-3.  Group  A was  given  the  four  items  in  Table  2-3 
and  asked  to  fill  in  2 x 2 tables  relating  each  of  the  causes  to 
the  event.  Specifically,  they  filled  in  each  cell  with  the 
number  of  people  out  of  100  who  would  fit  that  description. 

For  example,  one  cell  for  item  1 was  (lives  alone,  is  unmarried). 
Once  completed,  these  contingency  tables  can  be  used  to  test  the 
speculations  about  correlation  and  likelihood  presented  in  the 
above  discussion.  After  filling  in  both  2x2  tables  for  an  item, 
subjects  selected  the  cause  which  provided  a better  explanation. 
Aside  from  providing  a replication  of  Experiment  1,  these  choices 
can  be  related  to  the  chosen  causes'  relative  necessity, 
sufficiency,  likelihood  and  correlation  with  event.  Although 
promising  in  design,  this  task  is  quite  difficult  for  subjects. 

A less  demanding  way  to  unconfound  probabilities  and 
correlations  is  to  specify  them.  Groups  B and  C saw  2x2 
contingency  tables  relating  each  event  to  each  possible  cause. 
These  tables  were  filled  in  so  that:  (a)  one  cause  was  more 
necessary;  (b)  the  other  was  more  sufficient;  and  (c)  the  more 
sufficient  cause  also  bore  a higher  correlation  with  the  event 
than  did  the  more  necessary  cause.  After  studying  the 
information  in  the  tables,  subjects  selected  the  cause  which 
provided  the  best  explanation.  They  were  also  asked  to  explain 
their  choices.  Group  B saw  the  actual  tables;  Group  C received 
a verbal description of their contents. Correlations were
measured by φ (phi). Since there is no guarantee that subjects'
perceptions of these correlations will correspond to their
value according to φ, a fourth group (D) received the 2x2
tables with neutral labels and a description of what correlations
are. They were asked to estimate the correlations they saw.

3.1.2  Stimuli . Table  2-2  presents  the  2x2  tables  shown  to 
subjects  in  Groups  B,  C,  and  D.  For  each  item,  one  cause  is 
more  necessary,  while  the  other  is  more  sufficient;  the  more 
sufficient  cause  is  more  highly  correlated  with  the  event. 

Since  the  more  sufficient  cause  is  always  relatively  unlikely, 
yielding  extreme  marginals,  most  phi  values  are  fairly  small. 

3.1.3  Procedure . Instructions  were  straightforward  and  are 
available  upon  request.  Care  was  taken  not  to  overwhelm 
subjects  (particularly  in  Group  A)  and  not  to  emphasize  either 
the  information  relevant  to  necessity  or  that  relevant  to 
sufficiency.  Group  A instructions  were  supplemented  with  a 
blackboard  demonstration  showing  how  to  fill  in  a sample 
contingency  table.  Group  D's  instructions  included  four 
labeled  2x2  tables,  showing  correlations  of  1.0,  .00,  .14, 
and  .78.  In  order  to  give  a concrete  context,  both  these 
instructional  tables  and  those  on  which  subjects  were  tested 
were  labeled  rain/no  rain  for  the  columns  and  cloud  seeding/no 
cloud seeding on the rows. After the instructions, the eight
contingency tables shown in Table 2-2 were presented, along with
two additional tables showing correlations of .00 and 1.0.
These were included to test subjects' understanding of the
instructions.

3.1.4 Subjects. One hundred and seven subjects were recruited
as  in  Experiment  1. 


3.2 Results


3.2.1  Group  D.  The  contingency  tables  used  in  Groups  B and  C 
were  designed  so  that  the  more  sufficient  cause  bore  a higher 
correlation  with  the  event.  Group  D was  asked  to  judge  these 
correlations  in  a neutral  context  in  order  to  see  if  the 
ordering  of  calculated  correlations  was  also  the  ordering  of 
perceived correlations. Fourteen of 18 subjects correctly
assigned a value of 0 to the added table showing no correlation
and 1.0 to the table showing full correlation; data from the other
four subjects were deleted. Table 3-1 presents relevant data for
the 14 remaining subjects. For items 1, 2, and 3, the majority
of subjects assigned a higher subjective correlation to the
table which bore a higher computed correlation. In each case,
the magnitude of the differences was statistically significant.
For item 4, the two correlations were seen as equally large
(seven subjects viewed each as larger). If we assume that
subjects in Groups B and C attributed similar subjective
correlations, then they saw the more necessary cause as having
a lower correlation with the event for items 1, 2, and 3, and
an equal correlation in item 4. Thus, item 4 provides a valid
but less stringent test of their preference for necessity.

3.2.2 Groups B and C. Table 3-2 shows the number of subjects who,
after seeing the statistical cause-event data, picked each
possible cause as providing the better explanation. Responses
of those who saw the tabular version (Group B) and of those who
saw the verbal version (Group C) were quite similar and are
discussed together. As in Experiment 1, subjects consistently
thought that the more necessary cause (the first in each pair)
provided a better explanation than the more sufficient (and
more highly correlated) cause (p < .002 in each case; sign
test). The proportion of "no difference" responses was, however,
higher here (.29) than for the same four items in Experiment 1
(.16). Among subjects who did choose between A and B, the
proportion selecting A (i.e., A/(A+B)) was quite similar in the
two experiments, for each of the 4 items (p > .10) and overall
(.78 for Experiment 1 vs. .79 for Experiment 2).
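
The sign test cited above can be sketched as an exact binomial computation. This is a minimal illustration, not the authors' procedure: the function name `sign_test_p` is hypothetical, the test is assumed to be two-sided, "no difference" responses are assumed to be dropped before testing, and the counts in the example are invented rather than taken from Table 3-2.

```python
from math import comb


def sign_test_p(n_first, n_second):
    """Two-sided exact sign test p-value.

    n_first / n_second are the numbers of subjects preferring each cause;
    ties ("no difference") are assumed to have been excluded already.
    """
    n = n_first + n_second
    k = min(n_first, n_second)
    # Probability of a split at least this lopsided in one direction
    # under the null hypothesis of no preference (p = 0.5).
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)


# Hypothetical counts, NOT taken from Table 3-2.
p = sign_test_p(20, 4)
print(f"p = {p:.4f}")
```

With these invented counts, a 20-to-4 split among 24 choosers falls below the .002 level reported in the text.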

Most  subjects  who  chose  the  more  necessary  cause  explained 
that  choice  by  some  version  of  the  retrospective  probability 
criterion;  most  who  chose  the  more  sufficient  cause  invoked 
the  prospective  probability  criterion.  A few  subjects  used 
both of these criteria (on different questions). A few subjects
selected  the  more  necessary  cause  because  it  was  generally 
more  likely  (i.e.,  not  just  as  an  antecedent);  on  item  3 two 
subjects  invoked  the  sophisticated  argument  that  FBI  agents 
are  so  rare  that  one  cannot  trust  statistics  on  their  prevalence 
and  behavior.  Several  subjects  invoked  substantive  reasons 
which ignored the tabled statistics; a few gave incoherent
responses.

3.2.3  Group  A.  Composing  contingency  tables  proved  to  be 
quite  difficult:  on  each  item,  two  to  four  subjects  (of  33) 
failed  to  complete  the  task  correctly;  their  data  were  deleted. 
About 5% of the tables contained values which seemed
substantively unreasonable (e.g., 68% of the population are
computer programmers). Although dubious, such responses were
retained. Subjects found no difference between the two causes
in explanatory adequacy 39% of the time, a considerably higher
rate than previous groups and possibly another expression of
confusion. Because of these difficulties, we would view the
following results with some reservation.


Table  3-3  presents  mean  responses.  Although  none  of  these 
contingency  tables  matches  those  shown  to  Groups  B and  C, 
the  patterns  are  sufficiently  similar  to  suggest  that  the  values 
shown  those  groups  were  plausible.  Table  3-3  differs  from  Table 
2-2  primarily  in  the  inflated  values  given  by  Group  A to  events 
which  were  quite  rare  in  the  tables  presented  to  Groups  B and  C 
(e.g.,  Groups  B and  C were  told  that  there  were  1%  FBI  agents 
in  the  population  of  item  3;  Group  A estimated  10%,  suggesting 
either unreasonably high responses or a generally high level
of paranoia).

For each of the four items, the more necessary cause (as
measured by P(Ci|E)) according to these group judgments was
the same as in Groups B and C. These were also the causes
judged to be more necessary in Experiment 1. However, in
contrast with Experiment 1 and with Groups B and C, the two
possible causes have roughly similar sufficiency (as measured
by P(E|Ci)). Thus, the less necessary cause was judged equally,
not more, sufficient. These group results are borne out in the
tables  produced  by  individual  subjects  (see  Table  3-4).  The  cause 
designed  to  be  more  necessary  almost  always  was  so;  it  was  also 
judged  more  likely.  The  two  causes  were,  however,  similar  in 
their  sufficiency.  Table  3-4  also  shows  that  Group  A subjects 
agreed  with  other  groups  regarding  which  cause  provides  a 
better  explanation.  Were  subjects  guided  by  considerations  of 
sufficiency,  they  presumably  would  have  found  no  difference 
in  these  paired  causes  regarding  explanatory  adequacy.  The 
equal  sufficiency  of  these  causes  does,  however,  make  the 
present  evidence  for  reliance  on  necessity  weaker  than  that 
found  elsewhere. 
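
The pattern described above can be checked against the Group A mean cell values reported for the programmer item. The sketch below is illustrative only. It assumes that rows are cause present/absent, that columns are event present/absent (works as programmer or not), and that the cells are percentages of the population; the function name `measures` is hypothetical.

```python
from math import sqrt


def measures(a, b, c, d):
    """Return (necessity, sufficiency, phi) for one 2x2 table.

    Assumed layout:
      a: cause present, event present    b: cause present, event absent
      c: cause absent,  event present    d: cause absent,  event absent
    """
    necessity = a / (a + c)      # P(C|E): cause present given the event
    sufficiency = a / (a + b)    # P(E|C): event present given the cause
    phi = (a * d - b * c) / sqrt((a + b) * (c + d) * (a + c) * (b + d))
    return necessity, sufficiency, phi


# Mean Group A responses for the programmer item, as reported in the table.
orderly = measures(14.0, 41.4, 5.2, 39.4)
star_trek = measures(9.9, 30.1, 9.3, 50.6)

for name, (nec, suf, phi) in [("orderly", orderly),
                              ("Star Trek fan", star_trek)]:
    print(f"{name}: necessity={nec:.2f} sufficiency={suf:.2f} phi={phi:.2f}")
```

Under these assumptions the computed phi values round to .17 and .11, matching the tabled figures. Necessity is about .73 for "orderly" against .52 for "Star Trek fan", while sufficiency is about .25 for both, which is the "more necessary but equally sufficient" pattern the text describes.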


MEAN RESPONSES IN CONTINGENCY TABLES COMPLETED
BY GROUP A, EXPERIMENT 2

                          Works as      Does not work
                          programmer    as programmer     Phi

Orderly                     14.0            41.4          .17
Not orderly                  5.2            39.4

"Star Trek" fan              9.9            30.1          .11
Not a "Star Trek" fan        9.3            50.6


EVALUATION OF CAUSES BY GROUP A, EXPERIMENT

NOTE: Subjects directly indicated which cause provided the better explanation.
The other three judgments were inferred from the values in individual
subjects' contingency tables.


4.  DISCUSSION 


In  general,  a possible  cause  that  is  judged  to  be  a good 
explanation has a lot going for it. It is relatively likely
given  the  occurrence  of  the  event;  it  makes  the  event  relatively 
likely;  it  forms  a coherent  narrative  when  combined  with  the 
event;  it  "fits"  the  event;  it  makes  it  easy  to  see  how  the 
event  occurred.  That  is,  each  of  these  five  criteria  produces 
judgments  similar  to  those  elicited  by  the  other  four  and  by 
the  question  "how  adequate  an  explanation  is  this?" 

Our  efforts  to  produce  items  which  discriminated  between 
these  judgments  were  only  partially  successful.  Three  ways  of 
phrasing  the  question  "which  is  a more  adequate  explanation?" 
produced  nearly  identical  responses.  Possibly,  the  formal 
distinctions  between  these  criteria  have  no  subjective  relevance. 
The  main  difference  between  Explanations  2 and  3 is  that  the 
former  makes  no  mention  of  the  overall  likelihood  (base  rate) 
of  the  causes.  Some  of  the  results  from  Experiment  2,  however, 
suggest  that  overall  likelihood  may  be  hard  to  ignore  in  hunting 
for  explanations.  Explanations  1 and  3 differ  in  whether  one 
or  both  of  the  possible  causes  was  in  fact  present.  Research 
by  Kelley  (1973)  and  Shaklee  and  Fischhoff  (1977)  has  shown 
that  when  one  possible  cause  is  known  to  have  been  present,  people 
tend to discount the involvement and presence of the other.
Thus, this distinction, too, may have been subjectively muddled.
It remains to be seen, though, whether other items or elicitation
procedures will also produce similar judgments for these three
explanation questions. For example, although the formal
difference between Explanation 2 and Explanation 3 may be
substantial, the questionnaires used here differed in but a
word or two.

Three other judgments which produced similar responses
were  retrospective  probabilities,  representativeness  and 
availability.  Apparently,  people  engage  in  similar  thought 
processes  when  asked  how  likely  a cause  is  given  an  event,  how 
well  the  event  fits  the  cause  and  how  readily  one  can  see  how 
the  event  occurred  given  the  cause.  These  three  criteria  were 
called  the  "necessity"  cluster  because  the  retrospective 
probability  criterion  is  a measure  of  the  necessity  of  a 
cause  given  an  event. 

A distinct  "sufficiency"  cluster  included  prospective 
probabilities  and  coherence.  The  similarity  of  prospective 
probabilities  and  coherence  suggests  that  the  latter  question 
elicits  a sort  of  forward-looking  perspective.  Even  though 
these  two  criteria  are  often  contrasted  in  philosophical 
discussions, as presently operationalized, they did not differ
subjectively.  The  fact  that  neither  was  a very  good  predictor 
of  judgments  of  explanatory  adequacy  in  the  items  remaining 
after  wave  2 suggests  that  neither  embodies  all  of  what  people 
do  "naturally"  when  they  evaluate  possible  explanations. 

To  the  extent  that  judgments  of  necessity  and  sufficiency 
diverged, the former was a much better predictor of explanatory
adequacy.

Experiment  2 replicated  the  choice  of  better  explanation 
found  in  Experiment  1.  It  also  showed  that  the  choice  was  not 
due  to  there  being  a higher  correlation  between  the  more  necessary 
cause  and  the  event.  Subjects  preferred  the  more  necessary 
cause even when it bore a lower correlation, as measured both
by φ and by subjects' intuitive correlation coefficient (Group D).
If people do base their judgments of explanatory adequacy on the
degree  of  covariation  between  causes  and  events,  they  measure 
covariation  by  an  asymmetrical  measure.  That  is,  they  are  more 
concerned  with  the  predictability  of  causes  from  events  than 
of  events  from  causes. 

Because the more necessary cause must also be the more likely
one whenever necessity and sufficiency diverge, we
cannot exclude the possibility that people simply prefer to use
more likely causes in their explanations. Although many
investigators have found that people often ignore base rate
information (Kahneman and Tversky, 1973; Nisbett and Borgida,
1975), several recent studies have shown that such information
can  influence  judgment  if  it  is  perceived  to  have  causal 
significance  (Ajzen,  1977;  Tversky  and  Kahneman,  1977).  The 
present  base  rate  information  was  designed  to  be  causally 
relevant.  Carried  to  the  extreme,  however,  such  reliance  on 
likely  causes  seems  implausible;  for  example,  "X  was  breathing" 
would  then  be  a widely  invoked  cause. 
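
The opening claim of the preceding paragraph, that the more necessary cause must be the more likely one when necessity and sufficiency diverge, follows from Bayes' rule. The short derivation below is supplied for illustration and is not part of the original report; C_1 denotes the more necessary cause and C_2 the more sufficient one.

```latex
\begin{align*}
P(C_i \mid E) &= \frac{P(E \mid C_i)\,P(C_i)}{P(E)}, \qquad i = 1, 2 \\[4pt]
P(C_1 \mid E) > P(C_2 \mid E)
  &\;\Longrightarrow\; P(E \mid C_1)\,P(C_1) > P(E \mid C_2)\,P(C_2) \\[4pt]
  &\;\Longrightarrow\; \frac{P(C_1)}{P(C_2)} > \frac{P(E \mid C_2)}{P(E \mid C_1)} > 1
  \quad \text{since } P(E \mid C_2) > P(E \mid C_1) > 0 .
\end{align*}
```

Hence P(C_1) > P(C_2): a preference for necessity cannot be separated from a preference for high base rates on these items alone.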

An important question for future research is under what
conditions, if any, people will prefer more
sufficient causes to more necessary causes as explanations.
Philosophers  have  discussed  at  length  how  an  explainer's  goals 
can  change  the  explanations  he  or  she  finds  adequate  (e.g., 
Achinstein,  1977).  No  one  knows  which  of  these  often  subtle 
distinctions  has  psychological  reality.  One  speculation  is 
that  people  will  prefer  more  sufficient  explanations  when  they 
are  explaining  for  the  sake  of  future  prediction.  Identification 
of  a sufficient  cause  means  that  the  event  will  happen. 
Identification  of  a necessary  cause  only  implies  that  it  might. 




5.  FOOTNOTES 


1.  These  accounts  of  two  intricate  philosophies  of  science 
are  extraordinarily  abbreviated.  Additional  discussion  and 
fuller references may be found in Achinstein (1977), Evered
(1976),  or  Fischhoff  (1976). 

2.  These  deletions  included  4 of  Bear's  5 items.  For  the 
fifth,  there  was  a significant  disagreement  between  the  two 
groups  (Explanation  2 and  prospective  probabilities)  which 
most  closely  resembled  the  two  groups  which  agreed  in  this 
experiment.